Requirements

envs/01_qc.yml

Inputs:

Outputs:

Set the seed

set.seed(3980)

Timing of Script

start_time <- Sys.time()
start_time
## [1] "2026-02-06 16:06:59 -05"

Load Packages

library(htmltools)

Goals of file:

  1. Load the raw sequences.
  2. Assess read quality with NanoPlot.
  3. Run QC filtering with Filtlong.
  4. Check post-QC quality with NanoPlot.

NanoPlot 1

conda activate nanoplot 
NanoPlot --version 
NanoPlot 1.44.1
NanoPlot --fastq Data/prueba.fastq.gz --outdir Analysis/01_qc/nanoplot_pre_qc

Pre-QC results

cat(readLines("Analysis/01_qc/nanoplot_pre_qc/NanoStats.txt"), sep ="\n")
## General summary:         
## Mean read length:                3,905.0
## Mean read quality:                   8.4
## Median read length:              3,140.0
## Median read quality:                 9.5
## Number of reads:                40,000.0
## Read length N50:                 5,093.0
## STDEV read length:               3,269.0
## Total bases:               156,199,195.0
## Number, percentage and megabases of reads above quality cutoffs
## >Q10:    10460 (26.2%) 45.4Mb
## >Q15:    0 (0.0%) 0.0Mb
## >Q20:    0 (0.0%) 0.0Mb
## >Q25:    0 (0.0%) 0.0Mb
## >Q30:    0 (0.0%) 0.0Mb
## Top 5 highest mean basecall quality scores and their read lengths
## 1:   11.9 (258)
## 2:   11.7 (975)
## 3:   11.7 (3671)
## 4:   11.7 (2385)
## 5:   11.7 (1252)
## Top 5 longest reads and their mean basecall quality score
## 1:   40044 (9.7)
## 2:   35177 (8.1)
## 3:   34668 (9.7)
## 4:   34637 (10.5)
## 5:   33834 (9.5)
htmltools::includeHTML("Analysis/01_qc/nanoplot_pre_qc/Yield_By_Length.html")
htmltools::includeHTML("Analysis/01_qc/nanoplot_pre_qc/WeightedHistogramReadlength.html")
htmltools::includeHTML("Analysis/01_qc/nanoplot_pre_qc/LengthvsQualityScatterPlot_kde.html")
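
To pull a single metric out of NanoStats.txt for later comparison, a small shell helper may be enough. The sketch below is illustrative only: the function name `nanostats_value` is hypothetical, and it assumes the `Key:   value` layout of the general summary shown above.

```shell
# Hypothetical helper: print the value for one key of a NanoStats.txt summary.
# Assumes lines of the form "Key:    1,234.5"; spaces and thousands
# separators are stripped from the value.
nanostats_value() {
  awk -F':' -v key="$2" '
    $1 == key { v = $2; gsub(/[ \t,]/, "", v); print v; exit }
  ' "$1"
}

# Example usage (path taken from this report):
# nanostats_value Analysis/01_qc/nanoplot_pre_qc/NanoStats.txt "Number of reads"
```

Stripping the commas keeps the value usable in later arithmetic.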

Given these results, a minimum mean read quality of 7 and a minimum read length of 1,000 bp were chosen for filtering.
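
Before committing to a length cutoff, it can help to preview how many reads it would keep. The following is a minimal sketch and not part of the original analysis: `reads_at_least` is a hypothetical name, it assumes a gzip-compressed FASTQ with exactly four lines per record, and it checks length only (the mean-quality cutoff is left to Filtlong).

```shell
# Hypothetical helper: count reads with length >= MIN in a gzipped FASTQ.
# Assumes four lines per record, so the sequence is line 2 of each record.
# Prints "<kept> <total>".
reads_at_least() {
  gzip -dc < "$1" | awk -v min="$2" '
    NR % 4 == 2 { total++; if (length($0) >= min) kept++ }
    END { printf "%d %d\n", kept, total }
  '
}

# Example usage (input path taken from this report):
# reads_at_least Data/prueba.fastq.gz 1000
```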

Filtlong

conda deactivate
conda activate filtlong
filtlong --version
Filtlong v0.2.1
filtlong --min_length 1000 --min_mean_q 7 Data/prueba.fastq.gz | gzip > Data/prueba.L1000.Q7.fastq.gz

NanoPlot 2

conda deactivate
conda activate nanoplot 
NanoPlot --version 
NanoPlot 1.44.1
NanoPlot --fastq Data/prueba.L1000.Q7.fastq.gz --outdir Analysis/01_qc/nanoplot_post_qc

Post-QC results

cat(readLines("Analysis/01_qc/nanoplot_post_qc/NanoStats.txt"), sep ="\n")
## General summary:         
## Mean read length:                4,276.8
## Mean read quality:                   8.7
## Median read length:              3,424.0
## Median read quality:                 9.5
## Number of reads:                35,977.0
## Read length N50:                 5,155.0
## STDEV read length:               3,240.3
## Total bases:               153,867,890.0
## Number, percentage and megabases of reads above quality cutoffs
## >Q10:    10065 (28.0%) 45.1Mb
## >Q15:    0 (0.0%) 0.0Mb
## >Q20:    0 (0.0%) 0.0Mb
## >Q25:    0 (0.0%) 0.0Mb
## >Q30:    0 (0.0%) 0.0Mb
## Top 5 highest mean basecall quality scores and their read lengths
## 1:   11.7 (3671)
## 2:   11.7 (2385)
## 3:   11.7 (1252)
## 4:   11.6 (2680)
## 5:   11.6 (1854)
## Top 5 longest reads and their mean basecall quality score
## 1:   40044 (9.7)
## 2:   35177 (8.1)
## 3:   34668 (9.7)
## 4:   34637 (10.5)
## 5:   33834 (9.5)
htmltools::includeHTML("Analysis/01_qc/nanoplot_post_qc/Yield_By_Length.html")
htmltools::includeHTML("Analysis/01_qc/nanoplot_post_qc/WeightedHistogramReadlength.html")
htmltools::includeHTML("Analysis/01_qc/nanoplot_post_qc/LengthvsQualityScatterPlot_kde.html")

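Post-QC summary: filtering removed 4,023 of the 40,000 reads (about 10%) while keeping 153.9 Mb of the original 156.2 Mb (about 98.5% of bases). Mean read length increased from 3,905.0 to 4,276.8 and mean read quality from 8.4 to 8.7. Median read quality stayed at 9.5, and the read length N50 rose from 5,093 to 5,155.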
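
The effect of filtering can be expressed as the percentage of reads and bases retained. This is a small sketch using the read and base counts reported in the two NanoStats summaries above; `pct_retained` is a hypothetical helper name, not something provided by NanoPlot or Filtlong.

```shell
# Hypothetical helper: percentage of BEFORE that remains in AFTER,
# printed with two decimals.
pct_retained() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", 100 * after / before }'
}

pct_retained 40000 35977          # reads retained: 89.94
pct_retained 156199195 153867890  # bases retained: 98.51
```

Losing about 10% of reads but only about 1.5% of bases is what one would expect when most of the discarded reads are short.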